Lab 3

Univariate Linear Regression on Diabetes Dataset

Imported Data

Out[4]:
array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286131, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04688253,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452873, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00422151,  0.00306441]])
Out[5]:
(442, 10)
Out[6]:
(442,)
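
The array values and shapes above are consistent with scikit-learn's bundled diabetes dataset. A minimal sketch of the import step, assuming `load_diabetes` was used (the variable names `diabetes`, `X` and `y` are illustrative, not taken from the lab):

```python
from sklearn.datasets import load_diabetes

# Load the bundled diabetes dataset as NumPy arrays.
diabetes = load_diabetes()
X = diabetes.data      # feature matrix, already scaled by scikit-learn
y = diabetes.target    # disease progression measure

print(X.shape)  # (442, 10)
print(y.shape)  # (442,)
```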

Convert the diabetes X array into a table using pandas

Out[7]:
         age        sex        bmi  average_bp  S1_cholestrol     low_dl    high_dl         HD        ltg        glu
0   0.038076   0.050680   0.061696    0.021872      -0.044223  -0.034821  -0.043401  -0.002592   0.019907  -0.017646
1  -0.001882  -0.044642  -0.051474   -0.026328      -0.008449  -0.019163   0.074412  -0.039493  -0.068332  -0.092204
2   0.085299   0.050680   0.044451   -0.005670      -0.045599  -0.034194  -0.032356  -0.002592   0.002861  -0.025930
3  -0.089063  -0.044642  -0.011595   -0.036656       0.012191   0.024991  -0.036038   0.034309   0.022688  -0.009362
4   0.005383  -0.044642  -0.036385    0.021872       0.003935   0.015596   0.008142  -0.002592  -0.031988  -0.046641

The first five rows of the diabetes feature data are shown in the table above.

Now convert the diabetes y array into a pandas table and join it to the X table.

Out[8]:
   progression
0        151.0
1         75.0
2        141.0
3        206.0
4        135.0
Out[9]:
         age        sex        bmi  average_bp  S1_cholestrol     low_dl    high_dl         HD        ltg        glu  progression
0   0.038076   0.050680   0.061696    0.021872      -0.044223  -0.034821  -0.043401  -0.002592   0.019907  -0.017646        151.0
1  -0.001882  -0.044642  -0.051474   -0.026328      -0.008449  -0.019163   0.074412  -0.039493  -0.068332  -0.092204         75.0
2   0.085299   0.050680   0.044451   -0.005670      -0.045599  -0.034194  -0.032356  -0.002592   0.002861  -0.025930        141.0
3  -0.089063  -0.044642  -0.011595   -0.036656       0.012191   0.024991  -0.036038   0.034309   0.022688  -0.009362        206.0
4   0.005383  -0.044642  -0.036385    0.021872       0.003935   0.015596   0.008142  -0.002592  -0.031988  -0.046641        135.0
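
The two conversion steps can be sketched as follows. The column labels are copied from the tables above; the variable names (`X_df`, `y_df`, `df`) and the use of `pd.concat` are assumptions, since the lab's own cells are not shown.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()

# Column labels copied from the tables above (they differ from
# scikit-learn's default feature names).
columns = ["age", "sex", "bmi", "average_bp", "S1_cholestrol",
           "low_dl", "high_dl", "HD", "ltg", "glu"]

X_df = pd.DataFrame(diabetes.data, columns=columns)
y_df = pd.DataFrame(diabetes.target, columns=["progression"])

# Join features and target side by side on the shared row index.
df = pd.concat([X_df, y_df], axis=1)
print(df.head())
```

Concatenating along `axis=1` relies on both tables sharing the default 0..441 row index, which holds here because both come from the same dataset object.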

Visualization using Matplotlib

From the graph above, diabetes progression tends to be higher for people with a high BMI and for older people.
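
The original figure is not reproduced here. One way to draw a comparable plot is sketched below, assuming scatter plots of the scaled `bmi` and `age` features against progression; the layout, marker size and the added correlation check are illustrative choices, not the lab's own.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
age = diabetes.data[:, 0]   # first feature column is age
bmi = diabetes.data[:, 2]   # third feature column is BMI
progression = diabetes.target

# Two scatter plots sharing the y-axis so the trends are comparable.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
axes[0].scatter(bmi, progression, s=10)
axes[0].set_xlabel("bmi (scaled)")
axes[0].set_ylabel("progression")
axes[1].scatter(age, progression, s=10)
axes[1].set_xlabel("age (scaled)")
fig.tight_layout()

# Pearson correlations give a numeric check on the visual impression.
r_bmi = np.corrcoef(bmi, progression)[0, 1]
r_age = np.corrcoef(age, progression)[0, 1]
print("bmi vs progression:", r_bmi)
print("age vs progression:", r_age)
```

In a notebook the figure renders automatically; in a plain script, call `plt.show()` afterwards.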

Univariate Linear Regression Model Building

Linear regression model for Body mass index feature and progression

Coefficients and intercepts

Out[12]:
LinearRegression()
Coefficients:  [981.65543614]
Intercept:  152.28824927379569

The slope is about 981.66, so predicted progression increases with BMI (a positive relationship). The intercept is about 152.29, which is the predicted progression when the scaled BMI feature equals 0.
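
A minimal sketch of how such a model can be fitted is given below. The train/test split settings (`test_size=0.2`, `random_state=0`) are assumptions, because the lab's split is not shown, so the printed coefficient and intercept will not match the values above exactly.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_bmi = diabetes.data[:, [2]]   # BMI only, kept 2-D: shape (n_samples, 1)
y = diabetes.target

# Assumed split; the lab's test_size and random_state are unknown.
X_train, X_test, y_train, y_test = train_test_split(
    X_bmi, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

print("Coefficients: ", model.coef_)
print("Intercept: ", model.intercept_)
```

Indexing with `[2]` rather than `2` keeps the feature as a column matrix, which is the input shape scikit-learn estimators expect.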

Out[14]:
     Actual   Predicted
362   321.0  255.174269
249   215.0  211.794626
271   127.0  161.008702
435    64.0  129.267499
400   175.0  196.982065
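
A comparison table like the one above can be built by pairing the held-out targets with the model's predictions. This is a sketch under assumptions: it uses scikit-learn's default column name `bmi` via `as_frame=True` and an arbitrary split, so the row labels and values will differ from the lab's output.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_diabetes(as_frame=True)
X_bmi = data.data[["bmi"]]   # one-column DataFrame
y = data.target              # Series that keeps the original row labels

# Assumed split settings; the lab's own values are not shown.
X_train, X_test, y_train, y_test = train_test_split(
    X_bmi, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# y_test keeps its row labels, so the table is indexed by original row number.
comparison = pd.DataFrame({"Actual": y_test,
                           "Predicted": model.predict(X_test)})
print(comparison.head())
```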

Presenting the solution

Mean Absolute Error:  52.94370285288119
Mean Squared Error:  4150.6801893299835
Root Mean Squared Error:  64.42577271038341
R2 Score:  0.19057346847560142

The mean absolute error of about 52.9 is high, so the model's predictions are not accurate; note that only one independent variable (BMI) was used. The root mean squared error of about 64.4 likewise shows that predictions often fall far from the measured values, and the R2 score of about 0.19 means the model explains only about 19% of the variance in progression.
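
The four metrics reported above can be computed with `sklearn.metrics`, as sketched below. The split settings are again assumptions, so the printed numbers will differ from the lab's; RMSE is taken as the square root of MSE.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_bmi = diabetes.data[:, [2]]   # BMI feature only
y = diabetes.target

# Assumed split; not the lab's own settings.
X_train, X_test, y_train, y_test = train_test_split(
    X_bmi, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)             # same units as the target
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error: ", mae)
print("Mean Squared Error: ", mse)
print("Root Mean Squared Error: ", rmse)
print("R2 Score: ", r2)
```

MAE and RMSE are in the same units as progression, which makes them easier to compare against the target values than MSE.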